STANFORD PROGRAMS 1 INTRODUCTION
STANFORD PROGRAM LIBRARY
Or: How to Feel Like you can Do something
Without really having to
This document describes the programs (not subroutines) from the Stanford
Computer Music Group. These include programs for editing and mixing of sound
files, for doing several kinds of analysis on sound files, and for some types
of synthesis as well.
STANFORD PROGRAMS 2 ABOUT HEADERS
ABOUT HEADERS
Sound files either have headers or they don't. A header is a 128-word block
that tells several things about the file: what the sampling rate is, how it is
packed (12 bits to a sample, 18 bits to a sample, or 36-bit floating point),
how many channels it has (1 to 4 currently), and what the maximum amplitude
is. It can also contain a text comment.
Most programs call the same subroutine to read a sound file name for input.
This subroutine looks at the sound file and sees whether it has a header or
not. If it does have a header, it returns the header data to the calling
program and prints out the comment text. If it does not have a header, then it
interrogates the user directly. Usually it does so by asking three questions:
what the sampling rate is, what the packing is, and how many channels it has.
The sampling rate is usually defaulted to 25,600 Hz if you just type C.R.
(carriage return), the packing to 12-bit integer, and the number of channels
to monaural. Generally for the sampling rate, you can just type the number of
kHz, and if it is less than 100, it will automatically be multiplied by 1000
to convert to Hz. For the packing, 0 usually stands for 12-bit, 1 for 18-bit,
and 3 for 36-bit floating point.
If your sound file does not have a header on it and you wish it did, you can
run a program called HEADER and put one on it.
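For concreteness, the header information might be pictured as a record like
the following C sketch. The field names, order, and sizes here are purely
illustrative; the actual layout of the 128-word block is not spelled out in
this document.

  /* Illustrative only: the real header is a 128-word block whose exact
     layout is not documented here.  Field names and sizes are made up. */
  struct snd_header {
      int    srate;         /* sampling rate in Hz, e.g. 25600             */
      int    packing;       /* 0 = 12-bit, 1 = 18-bit, 3 = 36-bit floating */
      int    nchannels;     /* 1 to 4                                      */
      double maxamp;        /* maximum amplitude in the file               */
      char   comment[400];  /* text comment (size is a guess)              */
  };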
There are at least two formats for data, such as analysis or synthesis data.
There are merge files (sometimes abbreviated MRG files) which are (often)
large files of binary data. They consist of a number of separate sampled-data
functions, each with its own name, length, and possibly sampling rate. There
are programs that examine merge files (EXMRG) and display the various
functions, there are programs that can read and edit the various functions
(FUNED), and there are subroutines for dealing with merge files (MRGPAK).
There is also the SEG-type file. This is a text file that can be read into the
NEWMUS compiler. It is used for piecewise-linear functions. There exist
programs for editing these (FUNED) as well as converting between this form and
MRG file form. Most of the heavyweight analysis programs produce MRG files as
output. Most of the music synthesis programs take SEG-type functions as input.
All of the sources for these programs are on [MA,JAM] on the MUSIC0 UDP. They
use library routines which may be found in JAMLIB.REL[SUB,SYS], the sources of
which are on [LIB,JAM] on said UDP. They also use MRGPAK, which is a package
for binary data file manipulation; its source is also on [MA,JAM].
STANFORD PROGRAMS 4 ANALYSIS PROGRAMS
ANALYSIS PROGRAMS
The analysis programs come in several varieties.
There is S, which is a general purpose interactive analysis program, capable
of viewing sound files and taking discrete Fourier transforms of subsections.
This is good when you are just sort of curious about what is going on in a
particular sound file.
There is HANAL, which is useful for people working with MUSIC V or NEWMUS.
This is primarily designed for single tones from either orchestral instruments
or short vocal tones. The program computes the amplitude envelope of the tone,
then at the maximum amplitude, it takes the Fourier transform. Its output is a
SEG type function that can be read directly into NEWMUS as a function. This is
useful for trying to simulate a given tone in NEWMUS.
There is PVCOMP, which is not the least bit interactive. It does the phase
vocoder analysis, which is a time-variant discrete Fourier transform. This
program is coupled with DFSYN, which takes this data and turns it into
magnitude-frequency form. This latter form is the most useful for intuition
and for synthesis. DFSYN can also resynthesize the tone either at the original
pitch or at some other pitch.
There is FLTCMP, which computes the linear prediction coefficients for an
entire sound file. It does so by taking a fixed window and stepping this
window through time by equal increments. The method used is Burg's maximum
entropy method. This joins with FLTAPP, for applying the filter thus computed
to a sound file.
PITCH computes the pitch of a sound file by autocorrelation. It steps through
the sound file and gets the pitch at each point in time. You must bound the
search by supplying the range in which the pitch will most likely lie.
ANALYSIS PROGRAMS 5 S
S - AN INTERACTIVE BROWSING PROGRAM
Commands to S are single character commands with control, meta, or both
depressed. We will abbreviate below control by the character α, meta by β,
and control-meta by αβ. Most commands then ask various questions about what
is desired. The first thing you must do is specify a file name. This is done
by αI. It will ask you for a file name.
All commands that take values, like αB or αE below, will accept <RETURN> as
meaning "dont change the value after all" and will leave the number unchanged.
αI Set an input file.
αβI Close the input file.
αO Open output file
αC Copy input file to output file (closes output file)
This will copy the input file from the time specified by the begin time (see
αB below) to the end time (see αE below). Unless you specify otherwise, it
will always put a header on the output file (use the Xtend command -OH to
prevent it).
αB Set begin time
αE Set end time
On setting times in S: Times are given in seconds. In addition, times can be
given in samples by preceding the decimal sample number with the letter "S".
For instance, the times 1.0 and S25600 correspond to the same sample if the
sampling rate is 25.6 KHz. The time value is not converted to sample number
until the last instant, so if you change sampling rates after setting begin
and end times, these times will still be valid.
One can also type ∞, which stands for the sample number of the last sample in
the file.
αS Show waveform between begin time and end time
βS Advance times by 75 percent of their difference and show
αβS Move back above amount and show
λ( Move right 2↑(λ/2) units and Show
λ) Move left 2↑(λ/2) units and Show
λ/ Make window smaller by 2↑(λ/2)
λ\ Make window bigger by 2↑(λ/2)
This business is a way to specify a number along with the command. That is,
the "λ" above is a number that is the binary number specified by the control,
meta, and top keys. α is 1, β is 2, αβ is 3, top-α is 4, top-β is 5, top-αβ
is 6. Thus if you type "[" instead of "(", or "]" instead of ")", it
effectively adds three more to the total.
All the commands that display the sound file produce a file on the disk that
contains the picture itself. The file name is always SIG.PLT. You can look at
this file by using the αD command below - that command displays a plot file.
You can rename the plot file to something else using the βN command below.
αF Take DFT of sound within window (between begin and end times)
βF Take Cepstrum of sound within window
αβF Take Autocorrelation of sound within window
Various window functions can be applied while doing these transforms. The
window function options can be changed with the Xtend commands SWINDOW and
TWINDOW.
All these transforming routines produce a plot file called FFT.PLT - in
addition, they produce a "pseudo-sound" file called FFT.PS. The purpose of
this creation is to allow you to read it in, just like a sound file, with αI,
then to look at it more closely with α/ or αS as you will. The horizontal axis
for the FFT is in KHz (not Hz), so the beginning or ending "times" you specify
for αS are actually in this case in terms of KHz.
αL Filter a file
You get your choice of lo-pass, hi-pass, band-pass, and band-stop filters of
either Butterworth or Chebyshev characteristics. Don't forget that you must
set the begin and end times of the input file so the filtering will take some
non-zero portion of the input file.
Since filtering a file does somewhat unpredictable things to the amplitude, it
is usually the best idea to set the output packing mode to 36-bit floating
point to avoid amplitude over or underflow. You can later convert the floating
point to 12-bit integer using the copy command, αC. It will also renormalize
the file to exactly -1 to +1 in amplitude.
Note that one very handy use of the filtering routine is to remove 50-Hz hum.
To do this, you use a bandstop filter from maybe 40 to 60 Hz, or even a hi-
pass filter from 60 Hz on up (a sketch of such a filter follows the command
list below).
αP Apply optimum-comb algorithm
αD Display a plot file
βD Display it again
αM Compute the maximum value of a sound file
αN Rename a file
βN Rename the signal plot file SIG.PLT
αβN Rename the FFT file FFT.PLT
αR Reverberate a sound file
αH or α? HELP. Types out list of commands
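Here is the hum-filter sketch promised under the αL command above: in C-like
notation, a second-order Butterworth-style high-pass at 60 Hz might look like
this. It is only an illustration of the idea, not the filter design S itself
uses.

  #include <math.h>

  /* Sketch of a 60 Hz high-pass biquad with Butterworth Q (standard
     bilinear-transform design), applied to n samples at rate fs. */
  void highpass60(const float *x, float *y, long n, double fs)
  {
      const double PI = 3.141592653589793;
      double w0 = 2.0 * PI * 60.0 / fs;
      double alpha = sin(w0) / (2.0 * 0.7071);        /* Q = 1/sqrt(2) */
      double b0 = (1.0 + cos(w0)) / 2.0, b1 = -(1.0 + cos(w0)), b2 = b0;
      double a0 = 1.0 + alpha, a1 = -2.0 * cos(w0), a2 = 1.0 - alpha;
      double xm1 = 0, xm2 = 0, ym1 = 0, ym2 = 0;
      for (long i = 0; i < n; i++) {
          double out = (b0*x[i] + b1*xm1 + b2*xm2 - a1*ym1 - a2*ym2) / a0;
          xm2 = xm1; xm1 = x[i];
          ym2 = ym1; ym1 = out;
          y[i] = (float)out;
      }
  }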
αX is the "extended" command. It answers "yes? " and expects you to type a 1-
to-6 letter command. The commands are as follows:
ICLOCK Set input clock rate
OCLOCK Set output clock rate
CLOCK Set both clock rates
INCHAN Set number of channels on input file
ONCHAN Set number of channels on output file
NCHAN Set number of channels on both files
AUTOIN Turn on automatic file name incrementing mode
IPACK Set input packing mode
OPACK Set output packing mode
PACK Set both packing modes
OHEADE Make output file have a header
TEXT Set output header text
PBOOL Print out values of all boolean variables
CHANNE Set current channel <not used>
DMODE Set display mode (average, sample, direct, envelope)
Display mode applies to all time function displays (rather than transforms).
These modes determine what to do if there are more points in the window
(between the begin and end times) than there are on the screen (1024 maximum).
AVERAGE says display the average of the points around the point in question.
This is the default mode and is usually the right thing, except that for very
large windows, it will give the appearance of attenuating the signal. SAMPLE
will just resample the signal at the proper rate to get 1024 points on the
screen. This preserves the original amplitude spread, but sometimes leaves out
important points, especially if the sound being displayed is full of spikes
and impulses. DIRECT mode displays all the points anyway with no data
reduction at all. This usually gets display errors for exceeding the maximum
buffer size. ENVELOPE takes the maximum over each group of points (the number
of points in a group is specified by the WINDOW Xtend command below; the
default is 750 points, I think). This is really only useful in
conjunction with ABMODE, which says to take the absolute value of the sound
file first. The combination of ABMODE and ENVELOPE display mode gives the
amplitude envelope of the sound waveform. (These data reductions are sketched
in C-like form after the command list below.)
ABMODE Make function non-negative before displaying
PMODE Makes display use endpoint vectors
WINDOW Sets averaging window width
DPCHAN Sets which channel of multi-channel input file to display
NOXAXI Turns off display of x-axes on all displays
NOYAXI Turns off display of y-axes on all displays
WFLAG Specifies how FFT window relates to begin time & end time
SWINDOW Set windowing function to use on sound wave (for FFT)
TWINDOW Set windowing function to use on transform (AUTOC & CEP)
LOGF Makes FFT display of LOG of xfm
SELECT Select input file
DPSCALE Set display scaling
SLAM Slam bottom of display to zero
CUTOFF Set reverberator output cutoff
REVTIM Set reverberation output file length
Some of these things specify values and others are just true or false (boolean
variables). The Booleans are AUTOIN, OHEAD, ABMODE, PMODE, NOXAXIS, NOYAXIS,
LOGF. You can print out the values of all the boolean variables with the PBOOL
command.
All of these Xtended commands can be abbreviated to the smallest unique set of
letters. For instance, ICLOCK and OCLOCK may be abbreviated as IC and OC.
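Here is the sketch of the DMODE data reductions promised above (AVERAGE,
SAMPLE, and ENVELOPE; DIRECT simply plots everything, so it is not shown).
It is only meant to make the three ideas concrete; S's actual display code is
certainly organized differently, and the WINDOW group size is folded into a
single group size here.

  #include <math.h>

  #define SCREEN 1024   /* maximum points on the screen */

  /* mode: 0 = AVERAGE, 1 = SAMPLE, 2 = ENVELOPE; absfirst mimics ABMODE. */
  void reduce(const float *x, long npts, float *out, int mode, int absfirst)
  {
      long group = npts / SCREEN;          /* input samples per screen point */
      if (group < 1) group = 1;
      for (long i = 0; i < SCREEN && i * group < npts; i++) {
          const float *p = x + i * group;
          float v = absfirst ? fabsf(p[0]) : p[0];
          if (mode == 0) {                 /* AVERAGE: mean of the group     */
              double s = 0.0;
              for (long j = 0; j < group; j++)
                  s += absfirst ? fabsf(p[j]) : p[j];
              v = (float)(s / group);
          } else if (mode == 2) {          /* ENVELOPE: max over the group   */
              for (long j = 1; j < group; j++) {
                  float s = absfirst ? fabsf(p[j]) : p[j];
                  if (s > v) v = s;
              }
          }                                /* SAMPLE: keep the first point   */
          out[i] = v;
      }
  }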
Since reverberation is more complicated, we have reserved its explanation for
down here. The αR command will either filter an input file with a reverberator
of your choice, or will just produce the impulse response of a reverberator.
Eventually, it will come down to specifying the details of the reverberator.
You usually do so in a specification file. Such a file will accept the
following commands:
Conventions are:
"i" stands for an integer, as does "j"
"x" is a floating-point number
all commands are terminated by CRLF
Commands are:
NCOMBS=i Sets Number of parallel comb filters to i
CMBLEN[i]=j Sets delay of comb i to j samples
CMBG[i]=x Sets gain of comb i to x
N1ALPS=i Sets number of series 1-pole allpasses to i
ALP1L[i]=j Sets delay of 1-pole allpass i to j samples
ALP1G[i]=x Sets gain of 1-pole allpass i to x
N2ALPS=i Sets number of 2-pole allpasses to i
ALP2L[i]=j Sets delay of 2-pole allpass i to j samples
ALP2F[i]=x Sets oscillation frequency of 2-pole allpass i to x
ALP2G[i]=x Sets decay of 2-pole allpass i to x
DIRECT=x Sets fraction direct sound to x
PRINT Print out the values of all the parameters so far
EXIT Go on to computation
HELP Print this message
ABORT Get out of REV gracefully, back to command loop
For instance, this is a typical reverberation file and can be used directly to
set up an allpass reverberator for you:
NCOMBS=0
N1ALPS=5
N2ALPS=0
ALP1L[1]=1597
ALP1G[1]=.750
ALP1L[2]=1117
ALP1G[2]=.720
ALP1L[3]=787
ALP1G[3]=.692
ALP1L[4]=509
ALP1G[4]=.671
ALP1L[5]=331
ALP1G[5]=.651
DIRECT=.85
EXIT
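The file above asks for five one-pole (Schroeder-type) allpass sections in
series plus 85 percent direct sound. One such section, with its delay in
samples and its gain, can be sketched in C like this; this is the textbook
structure, not necessarily the exact code behind the αR command:

  /* One allpass section: y[n] = -g*x[n] + x[n-D] + g*y[n-D].
     buf must hold len samples and start out zeroed. */
  typedef struct { float *buf; int len, pos; float g; } allpass;

  float allpass_tick(allpass *a, float in)
  {
      float delayed = a->buf[a->pos];           /* x[n-D] + g*y[n-D]   */
      float out = delayed - a->g * in;          /* allpass output      */
      a->buf[a->pos] = in + a->g * out;         /* feed the delay line */
      if (++a->pos >= a->len) a->pos = 0;
      return out;
  }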
ANALYSIS PROGRAMS 12 HANAL
HANAL - QUICKIE ANALYSIS FOR ENVELOPE, SPECTRUM, AND PITCH
This program is for people using MUSCMP or MUSIC V who just have a sound file
of a single note and want to capture the amplitude envelope of the note as
well as the heights of the harmonics at some single point in the file. This is
a very simple-minded program.
This program requires the use of graphics, so you must be on a PDP-11 graphics
terminal to use it.
R HANAL
Input SND file: <type name of sound file containing single note>
Output MRG file: <make up a MRG file name here>
Function name in MRG file: <can use same name as file name>
Estimated fundamental frequency: <your best guess. Not critical>
<here it puts up a picture of the envelope>
<after viewing the picture, type C.R. to continue>
do envelope approximation? <if interested in envelope, type Y>
<shows you pictures of envelope and approximation>
FFT will be taken at .1013 <type C.R. to accept this time>
Is this OK? <this is where it will take FFT>
<puts up picture. Type Y to accept time>
After this, it is all automatic. The procedure is this. After you tell it your
best guess at the fundamental frequency, it puts up a picture of the amplitude
envelope. After you get tired of staring at this picture, you type C.R. It
will then start doing piecewise-linear approximations to the envelope. It will
do 20 approximations, putting up a picture after each one. After it finishes
with that, it will type out the time of the maximum point of the envelope. At
this point, you should probably type C.R. It will then put up a picture of the
actual waveform centered around that point. If you think that is a good place
to take an FFT (nice and regular) then type "Y" and it will go on. Otherwise,
type C.R., and it will ask you for the time to take the FFT. You can go
through this as many times as you like. Once is usually enough for most folks.
After this, the program exits. There will be a file <name>.TXT on your area
that contains the piecewise-linear approximations, the harmonic amplitudes,
and the pitch of the signal. The name of the file will be the same as that of
the MRG file. There will also be a MRG file with the entire envelope in it.
ANALYSIS PROGRAMS 14 PVCOMP
PVCOMP - PHASE VOCODER ANALYSIS
The "Phase Vocoder" is somewhat of a misnomer. It refers to a technique for
doing time-varying Fourier-like analysis on continuous sound. We can think of
the process as applying a bank of bandpass filters to the input signal. The
channel filters are all identical. They are each formed from a single
prototype filter that is shifted to equally spaced points across the sampling
rate by heterodyning. Each filter has a complex impulse response, so the
output of each channel is in phase quadrature. Further information can be
obtained by reading "Realization of the Phase Vocoder" by Michael Portnoff,
IEEE Trans. on Acoustics, Speech, and Signal Processing, June, 1976, and "Use of
the Phase Vocoder in Computer Music Applications" by James Moorer, Audio
Engineering Society preprint, October, 1976.
Anyway, this program, combined with DFSYN, and other display programs, can
give time-variant pictures of the spectra of sound files. Each channel can be
viewed separately or together in perspective.
You use the program as follows:
R PVCOMP
Input SND file: <type input file name here>
Reading from file xxxxx
Output PV file: <type output file name here>
Writing on file xxxxx
<types out file header information here>
Fundamental (or lowest) frequency: <see note below>
Number of windows/2 in average: <see note below>
After this, the program will compute for a long time (about 1 minute for every
second of sound) and will eventually exit. At this time, you will have an
output PV file, called whatever you typed for the answer to the second
question. Probably the next thing you want to do is use DFSYN to turn this
into a magnitude-frequency form.
If the sound file is only one pitch (like an isolated tone from an
instrument), then you should find the pitch with S (use the DFT feature) and
type in exactly the fundamental frequency here. On tones with vibrato, you can
type roughly the center frequency. In these cases, channels will correspond to
harmonics.
If the sound file has highly variable pitch, like in normal speech, then you
must type the lowest pitch the sound achieves for this question. This will
assure that each channel will capture no more than one harmonic. There may
thus be channels with no signal present, but this is OK.
Number of windows in the average. This determines how sharp the filters will
be. A large number here (like 4, 6, or 20) will make very sharp filters, but
they will extend over a great period of time. I have been favoring small
numbers lately; for instance, 4 or so seems to work well.
The output of this program is a merge file (sometimes abbreviated as a MRG
file). Presumably, the format of a merge file is described in an appendix
somewhere.
The number of channels that will be used will be one half of the sampling rate
divided by the fundamental (or lowest) frequency (rounded to the nearest
integer) plus one. For instance, with a sampling rate of 25,600 Hz, and a
minimum pitch of 64 Hz, there will be 201 channels, spaced exactly 64 Hz
apart, from 0 to 12,800 Hz. The length of each output function is determined
by the length of the input file and the number of channels. The length of each
function will be the length of the sound file, in seconds, times the
fundamental (or lowest) frequency, times two.
In the MRG file, the functions will be named "REAL.n" and "IMAG.n" where n is
the channel number from 0 to the maximum.
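The bookkeeping in the last two paragraphs, worked out in C for the 25,600 Hz,
64 Hz example in the text (the two-second duration is just a made-up
illustration):

  #include <stdio.h>

  int main(void)
  {
      double srate = 25600.0, f0 = 64.0;      /* values from the text       */
      double filedur = 2.0;                   /* example duration, seconds  */
      int  nchan = (int)(srate / 2.0 / f0 + 0.5) + 1;   /* = 201 channels   */
      long flen  = (long)(filedur * f0 * 2.0);          /* points/function  */
      printf("%d channels, spaced %g Hz apart, from 0 to %g Hz\n",
             nchan, f0, srate / 2.0);
      printf("each REAL.n and IMAG.n function has %ld points\n", flen);
      return 0;
  }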
ANALYSIS PROGRAMS 16 DFSYN
DFSYN - PHASE VOCODER MAGNITUDE-FREQUENCY CONVERSION
PVCOMP produces a file of analysis data that represents the outputs of a
number of bandpass filters that are equally spaced in frequency; DFSYN
converts this to the amplitude and frequency form. If what is coming out
of a given bandpass filter is a pure sinusoid (like a harmonic), then the
frequency will be the frequency of that sinusoid and the amplitude will be the
amplitude of the sinusoid. If what is coming through a given channel is not a
pure sinusoid (like two or more sinusoids together, or pure noise), then
the magnitude and frequency functions will not appear to make any sense.
This program can do either of two things - it can do the conversion and write
the amplitudes and frequencies out on another merge file, or it can use the
amplitudes and frequencies to synthesize a tone, possibly multiplying all the
frequencies by a constant factor first. This works well for simple sounds,
but for (for instance) low-pitched resonant male voices, it doesn't work so
well.
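The conversion for a single channel is the usual quadrature-to-polar step: the
amplitude is the magnitude of the (real, imaginary) pair, and the frequency is
the channel's center frequency plus the rate of change of its phase. A sketch
in C, with no claim that this is DFSYN's exact recipe:

  #include <math.h>

  /* One channel, one analysis step.  re/im are the REAL.n/IMAG.n values,
     re0/im0 the values one step earlier, k the channel number, f0 the
     channel spacing in Hz, dt the time between analysis points in seconds. */
  void chan_to_af(double re, double im, double re0, double im0,
                  int k, double f0, double dt, double *amp, double *freq)
  {
      const double PI = 3.141592653589793;
      double dphi = atan2(im, re) - atan2(im0, re0);
      while (dphi >  PI) dphi -= 2.0 * PI;     /* unwrap to (-pi, pi]  */
      while (dphi <= -PI) dphi += 2.0 * PI;
      *amp  = sqrt(re * re + im * im);
      *freq = k * f0 + dphi / (2.0 * PI * dt); /* center + phase slope */
  }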
The program is operated like this:
R DFSYN
Input PV file: <type PV file name here>
Output MF file: <type output file name here, or C.R. if not desired>
Output sound file: <type output sound file name, or C.R.>
Output compression ratio: <see below>
Intermediate compression ratio: <see below>
Maximum channel number: <see below - C.R. works here>
Make new merge file? <always answer "yes" here>
Frequency Multiplier: <ratio or C.R. for 1.0>
At this point, the program will execute for a long, long time, occasionally
burping out bits of text to tell you where it is. There are some general
rules for deciding how to set the various compression ratios. Ideally, we
would set both the output compression ratio and the intermediate compression
ratio to 1. What this means is that each output function (amplitude and
frequency) would have as many points as the original sound file. This gives
the best fidelity, but takes up a great deal of disk space. Since it is
not practical to store this entirely on the disk, we must compress it a bit.
If you give an output compression ratio of 16, then there will be one output
point (per channel) for each 16 input points. The intermediate compression
ratio says how many points will be used internally in the program. If you are
doing synthesis, this will work best if it is set to 1, but other values can
sometimes give reasonable results with much less computer time. If you are
just preparing output for a merge file, there is no reason to make the
intermediate compression ratio anything other than identical to the output
compression ratio.
Another annoying fact is that these compression ratios must divide integrally
into the number of channels (minus one), and that the intermediate compression
ratio must divide integrally into the output compression ratio.
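Spelled out in C, the divisibility rules in the last paragraph amount to this
(the numbers in the example come from the 201-channel case above):

  #include <stdio.h>

  int ratios_ok(int nchan, int outcr, int intcr)
  {
      if ((nchan - 1) % outcr != 0) return 0;  /* outcr must divide nchan-1 */
      if (outcr % intcr != 0)       return 0;  /* intcr must divide outcr   */
      return 1;
  }

  int main(void)
  {
      /* 201 channels: 200 is divisible by 10 but not by 16. */
      printf("%d %d\n", ratios_ok(201, 10, 10), ratios_ok(201, 16, 16));
      return 0;
  }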
When you are done with this program, you can produce a series of PLT files of
the amplitudes and frequencies by using AFPIX (or even EXMRG). The functions
in the merge file are named XXXX.An or XXXX.Fn where the XXXX is the file name
of the input PV file and n is the channel number.
In response to the "maximum number of channels" question, you probably want to
say C.R., which automatically sets it to as many as possible. You can set it
to less than this if you don't think you need the upper channels.
ANALYSIS PROGRAMS 18 FLTCMP
FLTCMP - LINEAR PREDICTOR FILTER COMPUTATION
This program is part of a package for doing work with linear predictive coding
that consists of the programs FLTCMP for computing the predictor coefficients,
FLTAPP to apply the filter to a sound file, PITCH to track the pitch of a
sound file, PDF to make the voiced-unvoiced decision and the silence-sound
decision for a sound file, LPS to synthesize from the pitch, various
decisions, and filter coefficients, and ENORM to normalize the energy of the
resulting synthesis to correspond to the original signal (or anything else).
To use FLTCMP, you do as follows:
R FLTCMP
Input SND file: <type sound file name>
Output K file: <type desired output file name>
Ms between covariance calculations: <type number, like 5, here>
Width of covariance window in Ms: <type number here, like 25>
Order of filter: <see below>
Autocorrelation method? <I favor Y these days>
Now the program will compute for a long time and then stop. It types an
asterisk after every 100 steps. As to what numbers you should put where, the
time between calculations should be as short as possible without compromising
efficiency. A useful number is 5 milliseconds. You just type 5 to the program.
You might try 2 or 1 if you really want faithful tracking, or perhaps 10 or 15
if the exact quality is not so important. As for the width of the covariance
window, for periodic sounds, it should be at least two periods wide (without
knowing any more - there are all sorts of subtle considerations here that I
shall ignore for the time being). For high-pitched sounds, however, this leads
to inordinately short windows. The window should be at least 50 percent wider
than the step size (ms between calculations). The order of the filter is a
complicated thing. If you want it to exactly mimic the sound, pitch and all,
you should have at least 2 orders for every harmonic. This will produce a
filter that will contain both spectral information as well as pitch
information. If you just want it to track the spectral information and not the
pitch information, you should set the order to somewhat less than that. For
low pitched voices, I have used 35 with reasonable success. For high voices,
16 seems to be more reasonable.
It normally does the Burg (harmonic mean) method of linear prediction, but it
also does the standard autocorrelation method, which is somewhat quicker. The
results are very similar, but often the autocorrelation is a bit smoother on
the whole.
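For reference, the heart of the Burg (harmonic mean, maximum entropy)
recursion is quite small. The C sketch below fits a p-th order predictor to
one window of samples; it is the textbook algorithm and carries no claim
about how FLTCMP itself is coded:

  #include <stdlib.h>

  /* Fit a p-th order all-pole predictor A(z) = 1 + a[1]z^-1 + ... + a[p]z^-p
     to x[0..n-1] by Burg's method.  a must have room for p+1 coefficients. */
  void burg(const double *x, int n, int p, double *a)
  {
      double *f = malloc(n * sizeof *f);            /* forward error  */
      double *b = malloc(n * sizeof *b);            /* backward error */
      double *old = malloc((p + 1) * sizeof *old);
      for (int i = 0; i < n; i++) f[i] = b[i] = x[i];
      a[0] = 1.0;
      for (int i = 1; i <= p; i++) a[i] = 0.0;
      for (int m = 1; m <= p; m++) {
          double num = 0.0, den = 0.0;
          for (int i = m; i < n; i++) {
              num += f[i] * b[i - 1];
              den += f[i] * f[i] + b[i - 1] * b[i - 1];
          }
          double k = (den != 0.0) ? -2.0 * num / den : 0.0;  /* reflection */
          for (int i = 0; i <= m; i++) old[i] = a[i];
          for (int i = 1; i <= m; i++) a[i] = old[i] + k * old[m - i];
          for (int i = n - 1; i >= m; i--) {                 /* update errors */
              double fi = f[i];
              f[i] = fi + k * b[i - 1];
              b[i] = b[i - 1] + k * fi;
          }
      }
      free(f); free(b); free(old);
  }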
ANALYSIS PROGRAMS 20 PITCH
PITCH - COMPUTE PITCH OF SOUND FILE
This program computes the pitch of a sound file using straight autocorrelation
with a smattering of statistical decision theory thrown in. It does various
error correction and smoothing on the pitch contour, but still manages to
make errors every now and then. It produces both a MRG file and a text file.
The text file contains a SEG-type description of the pitch contour, but scaled
to a maximum of 1.0. Since the maximum pitch detected is printed out, the
original pitch contour may be recovered by multiplying by this maximum.
It is used like this:
R PITCH
Input SND file: <Type sound file to analyse>
Output P file: <Make up a name. May be same as sound file>
Output TXT file: <Make up a name. Also may be same>
Debug mode: <Type C.R. - Otherwise will do display>
Minimum Frequency: <Type min frequency in Hertz>
Maximum Frequency: <Type maximum frequency in Hertz>
MS between Computations: <5 is a good number>
Correlation threshold: <Type C.R.>
After it terminates, you will have a P file which is in MRG format, and a TXT
file that is in SEG format. There will be no pitches reported outside of the
range you gave for minimum and maximum frequencies, so you better be sure they
are correct. The time between computations is not terribly critical. The less
the better. As low as 1 or 2 Ms takes quite a bit of computer time, but
provides about the smoothest tracing of pitch. The correlation threshold is
the minimum correlation the signal may have. If it has any less correlation
than this, it is assumed to be totally inharmonic, and the pitch reported is
arbitrary. What it does is just linearly interpolate between the last and next
valid pitches.
Inside the MRG file, there will now be three functions with the extensions P,
C, and E, for pitch, correlation, and energy.
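The core of one analysis frame, ignoring all the error correction, smoothing,
and statistical machinery the program adds, looks roughly like this (a sketch
only, not PITCH's actual code):

  #include <math.h>

  /* Search lags corresponding to [minf, maxf] Hz; return the pitch in Hz
     and the normalized correlation at the best lag. */
  double pitch_frame(const float *x, int n, double srate,
                     double minf, double maxf, double *corr)
  {
      int lo = (int)(srate / maxf), hi = (int)(srate / minf);
      if (lo < 1) lo = 1;
      int bestlag = lo;
      double best = -1.0;
      for (int lag = lo; lag <= hi && lag < n; lag++) {
          double num = 0.0, e0 = 0.0, e1 = 0.0;
          for (int i = 0; i + lag < n; i++) {
              num += x[i] * x[i + lag];
              e0  += x[i] * x[i];
              e1  += x[i + lag] * x[i + lag];
          }
          double r = (e0 > 0.0 && e1 > 0.0) ? num / sqrt(e0 * e1) : 0.0;
          if (r > best) { best = r; bestlag = lag; }
      }
      *corr = best;
      return srate / bestlag;
  }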
ANALYSIS PROGRAMS 22 PDF
PDF - DO VOICED/UNVOICED DECISION
This program takes a P file produced by PITCH and decides which parts of the
original file were voiced (had a definite pitch) and which parts were unvoiced
(had no pitch center). It also decides which parts are silence and which parts
are not silence. This is all the lead-in to a linear prediction synthesis
program, LPS, that is described next.
To use the program:
R PDF
Input P file: <type name of file produced by PITCH>
Function name in P file: <probably same as file name>
Correlation Threshold: <Type C.R. to be safe>
Energy Threshold: <Also C.R.>
The decision criterion used is simple thresholding on the energy and on the
correlation coefficient. If the energy is above a certain amount, the
utterance is declared not to be silence. If the energy is high enough and if
the correlation is high enough, it is declared to be voiced, otherwise
unvoiced. On the first run, these thresholds should be left as is by typing C.R.
If you think the results can be improved by changing the thresholds, feel free
to experiment here. A higher correlation threshold means that there will be
less voiced signal. A higher energy threshold means there will be more
silence.
This writes two more functions onto the P file with extensions PH and NH.
These are numbers between 0 and 1 that specify the strength (height) of
the voiced excitation and the height of the noise excitation. These two
excitations are then added together in LPS.
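The decision rule itself is just the pair of comparisons described above; in C
(the threshold values are whatever the program defaults to, which is not
stated here):

  /* energy and corr come from the E and C functions in the P file. */
  typedef struct { int silence, voiced; } decision;

  decision classify(double energy, double corr,
                    double energy_thresh, double corr_thresh)
  {
      decision d;
      d.silence = (energy < energy_thresh);
      d.voiced  = (!d.silence && corr >= corr_thresh);
      return d;
  }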
PROCESSING PROGRAMS 23 INTRODUCTION
INTRODUCTION TO PROCESSING PROGRAMS
Or: How to munge your bits
The processing programs include routines for doing linear prediction
synthesis, energy normalization, sound file mixing, filtering, and many more.
We have already discussed S, which has some processing functions as well as
analysis functions.
LPS does linear prediction synthesis from data produced by FLTCMP, PITCH, and
PDF (or any editing of these data). ENORM makes the energy of the resulting
synthetic sound correspond to that of the original. FLTAPP is useful for doing cross-
synthesis (like Tracy Petersen's work), producing the error function, or just
whitening a signal. MIXSND is the sound file mixer and is probably one of the
most important sound processing programs. It is capable of overlapping
multiple copies of various sound files all simultaneously. It can extract
pieces of sound files for overlay.
PROCESSING PROGRAMS 24 LPS
LPS - LINEAR PREDICTION SYNTHESIS
This program takes a MRG file produced by PITCH and PDF (a P file) and
synthesizes a sound file from it. It takes the pitch contour from the P
function in the P file, the pulse train amplitude from the PH function, and
the noise (Gaussian) amplitude from the NH function. In addition, you may
specify a frequency multiplication factor which just scales all the
frequencies involved. If the filter order is too high, this sometimes
produces an objectionable buzzing due to the "beating" of the original pitch
and the altered pitch.
It takes a FUNC file to specify time and frequency warping. If these are not
specified, unity is taken for both. For example, if the time warping is set to
2.0 throughout, then the resulting sound will be exactly twice as long as the
original.
To call:
R LPS
Input K file: <type name of K file from FLTCMP>
Function name for K parameters: <usually the same as the K file name>
Maximum number of coefficients: <C.R. gets all of them>
Input P file: <type name of P file from PITCH and PDF>
Function name for P parameters: <probably same as file name>
Output SND file: <make up a file name>
Warp function file (CR to finish): <type FUNC file name or CR>
<prints out names of functions in file, returns to above question>
Time warp function: <type name of function or C.R. for no time warp>
Frequency warp function: <type name of function or C.R. for no freq warp>
All voiced? <Y suppresses all frication>
All unvoiced? <If answer above was CR, Y makes whispered speech>
There exists a gain function. Use it? <Usually no>
Just use gain term? <Usually Yes>
This business at the end determines how the energy is normalized. There is
period-by-period exact normalization, and there is dead-reckoning
normalization. The exact normalization is the most precise, but doesn't always
work because the filter doesn't always exactly model the speech. Probably the
smoothest and safest is to not use the gain function but just use the gain
term. This causes the criterion to be the actual prediction error (rather
than the computed error term you get from the autocorrelation method) and
causes the dead-reckoning method to be used. This is both fast and smooth, but
can sometimes produce unnatural output strengths.
Some tips about linear prediction in general. First, the sounds are much
better if you put some reverberation on them. That takes the knife-edged
tonality out. Next, the algorithm is quirky. Sometimes it doesn't work right
and sometimes it does, and there is little way to tell when. You just have to
experiment with it. Sometimes the same passage just said differently will
produce better results. If you can't get good results for some speech, skip it
and try to find a better one. You can improve the voiced-unvoiced decisions (a
necessary step) by editing the functions in the .P file. Remember, the speech
sounds much better in context.
For the order of the filter, you can use the rule of thumb of 45 coefficients
for deep male voices and 25 for sopranos and in between for in between.
PROCESSING PROGRAMS 26 ENORM
ENORM - ENERGY NORMALIZATION
This program normalizes the energy of a sound file processed by FLTAPP. It
does this by comparing the energy to the original energy as recorded in the K
file as function RMS. You run it as follows:
R ENORM
Input FLT file: <sound file name to be normalized>
Input K file: <K file from FLTCMP>
Function name for K parameters: <probably same as file name>
Output SND file: <file name for output 12-bit file>
Time expansion ratio: <C.R. or number, like 1 or 1.5>
If you used a time expansion ratio in FLTAPP, then you ought to use the
same number here. It is not necessary for LPS because that already
does energy normalization. This program can also be used to just put the
envelope of one sound (from the K file) on another. Just run off a K
file at some very small order, like 2 or 4, for the sound whose envelope
you want to track, then you can apply that to any other sound file.
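Per frame, the idea reduces to matching RMS values; a sketch in C (not ENORM's
actual code), where target_rms stands for the value recorded in the K file's
RMS function, scaled however is appropriate:

  #include <math.h>

  /* Scale one frame of n samples so that its RMS matches target_rms. */
  void enorm_frame(float *y, int n, double target_rms)
  {
      double s = 0.0;
      for (int i = 0; i < n; i++) s += (double)y[i] * y[i];
      double rms = sqrt(s / n);
      double g = (rms > 0.0) ? target_rms / rms : 0.0;
      for (int i = 0; i < n; i++) y[i] = (float)(y[i] * g);
  }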
PROCESSING PROGRAMS 27 FLTAPP
FLTAPP - APPLY LINEAR PREDICTION FILTER
This program applies a filter (as computed by FLTCMP) to a sound file, either
in the inverse form or the forward form. The inverse form will remove
resonances ("whiten" the file). The forward form will impose resonances. For
instance, to do Petersen-style cross-synthesis, you would produce an
excitation source by taking some sound file, computing the filter for it with
FLTCMP, then filtering it with the inverse filter using FLTAPP. This
excitation source can then be filtered by the forward filter from another
sound file. In this manner, we can impose the spectral shape of, say, speech,
on the pitch and articulation contour of some other sound file.
R FLTAPP
Input SND file: <type file name>
Input K file: <type K file name>
Output SND file: <type file name>
Number of coefficients: <C.R. or number>
Do you want the inverse filter? <Y or N, depending>
Starting time in sound file: <C.R. or time in seconds>
Time expansion ratio: <C.R. or factor, like 1.5 or 2>
This program produces a floating point file, so you have to run it through
either S or ENORM to convert it to 12-bit fixed point.
The only way to really understand the effect this program has is to run it on
a few things and listen to the results.
The starting time in the sound file is for when, for instance, you are
applying speech formants to a sound file, but not to the entire sound file,
only to some small portion. In this case, you select the
beginning time of the portion with this starting time and you select the
duration with the time expansion ratio. This will expand (or shrink) the
duration of the speech data. >1 expands, <1 shrinks.
To do real cross-synthesis, usually the process is to take an instrument tone
and whiten it by filtering it with its own inverse filter of low order (4 or
6). Take this whitened sound and then filter with the forward filter for the
speech sound that you are imposing (usually of high order like 35 or 45). The
whitening improves the intelligibility of the speech immensely. You can get
all degrees between the instrument and the voice by increasing the whitening
and increasing the order of the vocal filter. Things that are already white
(like cymbal crashes or ocean sounds) probably don't need much inverse
filtering. This all works much better if the instrumental and voice sounds are
exactly matched, like done at the same time. For instance, you can take a
speech sound, then using our marvelous simultaneous play and record feature in
ADUDP, record a sax line that matches the speech. You might want to produce
the speech at different durations also, like 2X or 4X. You can do this with
LPS, which will not give ultimate quality, but will give you some speech that
you can sync the sax with. This matched instrument and voice seems to give the
best results.
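In terms of the predictor coefficients a[1]..a[p] (with a[0] = 1), the inverse
and forward filters are just an FIR and its all-pole inverse. A sketch in C,
with the usual caveat that sign conventions vary and this is not FLTAPP's
actual code:

  /* Inverse (whitening): e[n] = x[n] + a[1]x[n-1] + ... + a[p]x[n-p]
     Forward (resonant):  y[n] = e[n] - a[1]y[n-1] - ... - a[p]y[n-p] */
  void inverse_filter(const float *x, float *e, long n, const double *a, int p)
  {
      for (long i = 0; i < n; i++) {
          double s = x[i];
          for (int k = 1; k <= p && k <= i; k++) s += a[k] * x[i - k];
          e[i] = (float)s;
      }
  }

  void forward_filter(const float *e, float *y, long n, const double *a, int p)
  {
      for (long i = 0; i < n; i++) {
          double s = e[i];
          for (int k = 1; k <= p && k <= i; k++) s -= a[k] * y[i - k];
          y[i] = (float)s;
      }
  }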
PROCESSING PROGRAMS 29 SRCONV
SRCONV - CHANGE SAMPLING RATES
This program will change the sampling rate of a file by any integral ratio,
like 2/1, 3/2, 201/200, and the like. This is like speeding up or slowing
down the tape. The duration will be changed along with the pitch. The output
file is in 36-bit floating point and must be converted to integer by either S
or ENORM or something.
The program asks you for two numbers. These are the numerator and the
denominator of a ratio. The denominator refers to the output file and the
numerator refers to the input file. So, for instance, to take every other
sample, or effectively double the sampling rate, you should type 1/2
(numerator of 1, denominator of 2). This will give you a file half as long.
R SRCONV
Input SND file: <input file here>
Output SND file: <output file name here>
Numerator: <Corresponds to input file>
Denominator: <Corresponds to output file>
Note that by use of any pitch changing method (such as DFSYN or LPS) and this
routine, you can effectively change the duration of the file without changing
the pitch. For instance, if you change the pitch by a factor of 2 (put it up
an octave), then change the sampling rate by a factor of 2/1 (doubles the
number of samples), then you get the original pitch back, but the duration is
now doubled. This isn't really the best way to do this. The duration change
should actually be worked into the synthesis routines.
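Whichever way the numerator and denominator are assigned, the underlying
operation is resampling by a rational factor. A linear-interpolation sketch in
C is below; a real sample-rate converter (SRCONV presumably included) uses
better interpolation than this, so treat it only as a picture of the idea:

  /* Resample x[0..nin-1] by the rational factor num/den using linear
     interpolation.  y must have room for roughly nin*den/num samples;
     the number of output samples actually written is returned. */
  long resample(const float *x, long nin, float *y, long num, long den)
  {
      long nout = 0;
      for (long j = 0; ; j++) {
          double pos = (double)j * num / den;    /* position in the input */
          long   i   = (long)pos;
          if (i + 1 >= nin) break;
          double frac = pos - i;
          y[nout++] = (float)((1.0 - frac) * x[i] + frac * x[i + 1]);
      }
      return nout;
  }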
PROCESSING PROGRAMS 30 HEADER
HEADER - PUT HEADER ON SOUND FILE, OR EDIT EXISTING HEADER
This program allows you to put a header on a sound file that doesn't already
have one. It also allows you to alter the header if one already exists. It is
pretty self-explanatory.
DISPLAY PROGRAMS 31 INTRODUCTION
DISPLAY PROGRAMS
This section is sort of a catch-all for programs that don't do any analysis.
These are mostly just programs that examine MRG files, since MRG files are a
bit opaque in general.
There is EXMRG, which can examine any function in any MRG file. It is capable
of looking at only a portion of a function in a MRG file also.
AFPIX assumes that the MRG file has been produced by DFSYN and pairs the
amplitude and frequency functions into one picture. This one runs straight
through and produces pictures (and plot files) of all the functions without
you saying a thing.
DISPLAY PROGRAMS 32 EXMRG
EXMRG - EXAMINE MERGE FILES
This program displays functions in MRG files.
R EXMRG
MRG file name: <type in file name here>
<it may type out the names of the functions now>
Function name: <type desired function name>
Beginning time: <C.R. for beginning of function>
Ending time: <C.R. for all the way to the end>
<it displays the function now>
<type C.R. to display another function>
Each time it does a display, it writes out another plot file. It starts with
the file name MRG1.PLT and increments the number each plot. Thus, using the S
program ($D command), you can look at the plot files.
DISPLAY PROGRAMS 33 AFPIX
AFPIX - DISPLAY MAGNITUDE AND FREQUENCY FUNCTIONS
For MF files that were produced by DFSYN, this program displays some number of
channels worth, putting a single channel, amplitude and frequency, on each
display. This writes out a series of PLT files also.
R AFPIX
Input MF file: <Type name of file>
Function name: <Probably same as file name>
Plot only? <If you don't want display, just plot files, type Y>
Starting channel: <1 is a good number>
Ending channel: <as high as you want to see>
Count by: <probably want 1 here>
It will type out each file name as it writes it out on the disk.
EDITING PROGRAMS 34 INTRODUCTION
EDITING PROGRAMS
We have already mentioned S, which is capable of doing some minor editing.
Here we will mention REVED, the reverberation editor, EDSND, the sound file
editor, and FUNED, the function editor. EDSND allows a user to break up a
sound file into segments either by thresholding the energy of the signal, or
by visual inspection of the file, to hear selected segments of the file in any
order, and to copy the selected segments in any order to an output file. REVED
allows manipulation of reverberation parameters and display of the impulse
response of the resulting reverberator. This seems to be the only convenient
way to design a reverberator. FUNED allows editing of SEG-type functions, as
well as conversion between MRG format and SEG format.
EDITING PROGRAMS 35 FUNED
FUNED - PIECEWISE-LINEAR FUNCTION EDITOR
When the program is first entered, it prints out a list of the functions. It
then asks "Command or Function to modify". You then type (in full) either a
command or a function name. If you type a function name, the function editor
(EDFUN) will be called with that function as an argument. If there isn't a
function of the name that you typed, the name will be examined for whether it
is a recognized command. The following commands are interpreted:
EXIT - Leave FUNED. <ALT> also works.
PRINT - Prints out the names of the functions.
INPUT - Input a SEG-type function file. Asks for a file name and appends the
new function list onto the old one. You then have a longer function list. If
any of the new functions have the same name as the old ones, you have a little
bit of a problem, of course. Any search for that function name will just grab
the first one in the list. You can, however, rename them.
WRITE - Asks for a file name and writes out the entire function list on a
file.
DELETE - Asks for a function name and deletes it from the list.
RENAME - Asks for an old function and a new function name and renames it.
MERGE - This interfaces with the merge-file routines. It asks you for a merge
file name. If there is a merge file by that name out there, it prints a list
of the names of all the functions therein. It then asks you for a merge-file
function name. If you type <CR>, you will exit this command and get back to
the "Command or Function to modify" prompt. If the merge-file function name
you typed can be found, it then asks you "Straight or Piecewise-Linear?". If
you respond "S" to this, it just takes the merge file function and makes it
into a SEG function without any loss of data. That is, there will be as many
breakpoints in the function as there are points in the original function. It
will ask you for a new name for the SEG function. If you would like some data
reduction, you should respond "P". You will then get into another little
command loop that can accept four different commands: <CR> adds another line
segment to the approximation, "↑" subtracts a segment from the approximation,
"Y" stops here and makes up a record corresponding to the selected
approximation, and <ALT> aborts the process and forgets about it. If you get
through "Y", it asks you for a new SEG function name and calls the new record
that. It then goes back to asking you for another merge-file function name.
<CR> at that point gets you back to the main command loop.
There are plans in the works for this routine to do function plotting, other
forms of display, and limited sound-file analysis too, and anything else
anyone feels would be useful.
If you just type the name of a function, rather than any of the above special
commands, you get into EDFUN, the function editor. This is a minimal
function editor. It allows hand-modification of SEG-type functions. When the
routine is first entered, it displays three basic things: the function itself
with horizontal and vertical axes, a list of the breakpoints (or at least as
many as it can get on the screen), and a "cursor", which looks for all the
world like a great big sharp sign. This cursor surrounds the "current"
breakpoint, which is initialized to be the first breakpoint.
This program accepts simple commands that consist of a repeat argument and an
activation character. If the repeat argument is missing, it is taken to be
one. If included, it specifies the number of times the command is to be
repeated. The special symbols "∞" or "*" stand for the number of breakpoints
in the function. The activation character is a single character with <CONTROL>
(hereafter abbreviated as "α"), or <META> (hereafter abbreviated as "β") on.
Some commands accept a binary scale on the repeat argument coded in the
control, meta, and sometimes top bits. This will be denoted as "λ" below.
(Note: at IRCAM right now, the effect of α and β (control and meta keys) must
be simulated with ALTs. One ALT means α, two alts mean β, and three alts mean
αβ. I haven't figured out yet how to get a single ALT in as a character,
though. Hmmm.)
First, to get out of the editor, you type αE, which updates your record and
exits. If you want to get out without updating the record, type <ALT>. If you
want to update the record without exiting, type α. (that's right,
<CONTROL><PERIOD>). To reset the display from whatever is in the record, type
αβ<BS> or αβO (sort of like αXCANCEL in E). If you are doing any hairy
complicated change, you will probably want to α. a lot.
To make any modification, you move to the breakpoint you want to change, and
you change it. Breakpoint moving commands are "←" and "→" for move left and
right. Breakpoint changing commands are many. Each time you move a breakpoint,
the values at that point are saved away, and you can get them back with αR
(restore). This allows you to make any stupid move and as long as you haven't
moved to a different breakpoint, you can reset the value to what it was
before.
There is αD which deletes the current breakpoint. A repeat argument means to
delete that many breakpoints. If you ask to delete more than 4, it will ask
you to confirm the massive delete.
Also, αI invents a new breakpoint and puts it on top of the old one. A repeat
argument creates that many new breakpoints. You can then move them around to
wherever you want.
αB breaks the line segment after the current breakpoint into the number of
pieces specified by the repeat argument. It will look for all the world as if
nothing has happened, because what it does is slide down the line segment
and insert that many new breakpoints along the line. You can then move through
and change them to wherever you please.
αC is to type in a new value for the breakpoint. A repeat argument says to do
it to that many consecutive breakpoints. It loads the current value of the X
and then Y coordinates of the breakpoints into your line editor. You should
type <CR> if you don't want to change them, or edit them however you wish. The
only problem here is that you have to be sure and wait for the line editor to
get loaded (Hic!) before typing <CR>, else the program will get very confused.
At IRCAM, it doesn't (yet) load your line editor.
To move breakpoints, you use the commands λ(, λ), λ/ or λ\. To figure out what
this is about, put your four fingers (thumb excluded) over the keys ()/ and \.
They then are left, right, up, and down, with the strength specified by the
control, meta, and top keys, presumably operated with the left hand. A repeat
argument specifies more strength in the motion. If the minimum strength is too
much, you can change it with the α* and α⊗ commands. These cause the strength
to be halved or doubled respectively. The strength is initially set to 8.0,
and is in terms of raster units on the screen. α( will move the breakpoint to
the left by eight raster units (unless you change the scale from its default
of 8). αβ[ will move the breakpoint to the left by 512 raster units (I
think!).
αN is used to normalize a function to be like a classical FUNC SEG function.
That is, it normalizes the horizontal axis to go from 1 to 100, and the
vertical axis to go from 0 to 1. If you are using this editor to set up stuff
for the music program, you probably ought to normalize the function sooner or
later.
That's about it. The key commands are αE to get out, α→ and α← to move to
different breakpoints, the ()/\ commands for changing breakpoints, as well as
αC for reading in new coordinates, the αI and αB commands for making up new
breakpoints, and αD for getting rid of breakpoints. What could be simpler?